Linguistically debatable or just plain wrong?

Authors

  • Barbara Plank
  • Dirk Hovy
  • Anders Søgaard
Abstract

In linguistic annotation projects, we typically develop annotation guidelines to minimize disagreement. However, in this position paper we question whether we should actually limit the disagreements between annotators, rather than embracing them. We present an empirical analysis of part-of-speech annotated data sets that suggests that disagreements are systematic across domains and to a certain extent also across languages. This points to an underlying ambiguity rather than random errors. Moreover, a quantitative analysis of tag confusions reveals that the majority of disagreements are due to linguistically debatable cases rather than annotation errors. Specifically, we show that even in the absence of annotation guidelines only 2% of annotator choices are linguistically unmotivated.

Similar resources

Learning part-of-speech taggers with inter-annotator agreement loss

In natural language processing (NLP) annotation projects, we use inter-annotator agreement measures and annotation guidelines to ensure consistent annotations. However, annotation guidelines often make linguistically debatable and even somewhat arbitrary decisions, and inter-annotator agreement is often less than perfect. While annotation projects usually specify how to deal with linguistically ...

Misinformation in News Coverage of Professional and College Athlete Musculoskeletal Ailments

Background: The general population’s understanding of musculoskeletal health is likely influenced by media reports of the ailments of prominent athletes. We assessed factors independently associated with debatable or potentially misleading medical statements in mainstream sports media coverage of the ailments of professional and college athletes. Methods: We identified and assessed 200 Int...

I. Electronic Voting Systems—Is Brazil Ahead of its Time?

The first article, by Professor Pedro Rezende of the University of Brasilia, describes the political context for the introduction of voter-verifiable paper ballots to their DRE (direct-record electronic, or touch-screen) voting machines for their 2002 elections. Rezende argues that many of the criticisms levelled against voter-verifiable paper ballots, such as the criticism that voter-verifiabl...

System Evaluation and Assurance 26.1 Introduction

I’ve covered a lot of material in this book, some of it quite difficult. But I’ve left the hardest parts to the last. These are the questions of assurance — whether the system will work — and evaluation — how you convince other people of this. How do you make a decision to ship the product, and how do you sell the safety case to your insurers? Assurance fundamentally comes down to the question ...

The Average American has 2.3 Children

Average-NPs, such as the one in the title of this paper, have been claimed to be 'linguistically identical' to any other definite-NPs but at the same time to be 'semantically inconsistent' with these other definite-NPs. To some this is an ironclad proof of the irrelevance of semantics to linguistics. We argue that both of the initial claims are wrong: average-NPs are not 'linguistically identic...

Publication date: 2014